In the classic multi-armed bandits problem, the goal is to have a policy for dynamically operating arms that each yield stochastic rewards with unknown means. The key metric of interest is regret, defined as the gap between the expected total reward accumulated by an omniscient player that knows the reward means for each arm, and the expected total reward accumulated by the given policy. The policies presented in prior work have storage, computation, and regret all growing linearly with the number of arms, which is not scalable when the number of arms is large. We consider in this work a broad class of multi-armed bandits with dependent arms that yield rewards as a linear combination of a set of unknown parameters. For this general framework, we present efficient policies that are shown to achieve regret that grows logarithmically with time, and polynomially in the number of unknown parameters (even though the number of dependent arms may grow exponentially). Furthermore, these policies only require storage that grows linearly in the number of unknown parameters. We show that this generalization is broadly applicable and useful for many interesting tasks in networks that can be formulated as tractable combinatorial optimization problems with linear objective functions, such as maximum weight matching, shortest path, and minimum spanning tree computations.
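To make the setting concrete, the following is a minimal sketch of a generic UCB-style index policy for linear-reward bandits, not the paper's exact algorithm: it assumes semi-bandit feedback (the per-parameter observations of the played arm are revealed), an application-supplied combinatorial `oracle` (e.g., a max-weight matching or shortest-path solver), and an illustrative exploration constant. The class name, `oracle` interface, and constants are assumptions for illustration. The key property from the abstract is visible in the code: storage is linear in the number of unknown parameters `m`, independent of how many (possibly exponentially many) arms exist.

```python
import math
import random

class LinearRewardBandit:
    """UCB-style index policy for bandits whose arms pay a linear
    combination of m unknown parameters (semi-bandit feedback assumed).
    Storage is O(m): one sample mean and one counter per parameter,
    regardless of the number of arms. Illustrative sketch only."""

    def __init__(self, m, oracle, width=2.0):
        self.m = m                # number of unknown parameters
        self.oracle = oracle      # solves the combinatorial problem over index weights
        self.width = width        # exploration constant (illustrative assumption)
        self.means = [0.0] * m    # sample mean of each parameter
        self.counts = [0] * m     # number of observations of each parameter
        self.t = 0                # global time step

    def select(self):
        """Compute a UCB index per parameter, then let the oracle pick the
        arm (a 0/1 incidence vector) maximizing the indexed linear objective."""
        self.t += 1
        ucb = [
            float("inf") if n == 0
            else mu + math.sqrt(self.width * math.log(self.t) / n)
            for mu, n in zip(self.means, self.counts)
        ]
        return self.oracle(ucb)

    def update(self, arm, observations):
        """Incrementally update only the parameters the played arm touched."""
        for i, used in enumerate(arm):
            if used:
                self.counts[i] += 1
                delta = observations[i] - self.means[i]
                self.means[i] += delta / self.counts[i]

# Toy usage: a brute-force oracle over a tiny arm set; in practice this
# would be a polynomial-time solver (matching, shortest path, MST, ...).
arms = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
oracle = lambda w: max(arms, key=lambda a: sum(ai * wi for ai, wi in zip(a, w)))
theta = [0.8, 0.2, 0.5]  # hidden parameter means (hypothetical)
bandit = LinearRewardBandit(m=3, oracle=oracle)
for _ in range(1000):
    arm = bandit.select()
    obs = [float(random.random() < theta[i]) if arm[i] else 0.0 for i in range(3)]
    bandit.update(arm, obs)
```

Under this setup, regret over a horizon of T steps is the gap between T times the best arm's expected linear reward and the policy's expected cumulative reward; the abstract's claim is that policies of this index-plus-oracle flavor can keep that gap logarithmic in time and polynomial in m.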